Colorectal cancer (CRC) is one of the most common causes of cancer and cancer-related mortality worldwide. Timely colon cancer screening is key to early detection. Colonoscopy is the primary modality used to diagnose colon cancer. However, the miss rate of polyps, adenomas, and advanced adenomas remains high. Early detection of polyps at the precancerous stage can help reduce mortality and the economic burden associated with colorectal cancer. Deep-learning-based computer-aided diagnosis (CADx) systems may help gastroenterologists identify polyps that might otherwise be missed, thereby improving the polyp detection rate. In addition, a CADx system could prove to be cost-effective for improving long-term colorectal cancer prevention. In this study, we propose a deep-learning-based architecture for automatic polyp segmentation, called Transformer ResU-Net (TransResU-Net). Our proposed architecture is built upon residual blocks with ResNet-50 as the backbone and exploits a transformer self-attention mechanism together with dilated convolutions. Our experimental results on two publicly available polyp segmentation benchmark datasets show that TransResU-Net achieves a highly promising dice score and real-time speed. On the basis of our performance metrics, we conclude that TransResU-Net could be a strong benchmark for building real-time polyp detection systems for the early diagnosis, treatment, and prevention of colorectal cancer. The source code of the proposed TransResU-Net is publicly available at https://github.com/nikhilroxtomar/transresunet.
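The dice score mentioned above is the standard overlap metric for segmentation; as a minimal sketch (function name and epsilon smoothing are illustrative, not taken from the TransResU-Net repository), it can be computed over binary masks as:

```python
def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (flat lists of 0/1).

    dice = 2|P ∩ T| / (|P| + |T|); eps avoids division by zero
    when both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)
```

A perfect prediction yields a score of 1.0, while disjoint masks score near 0.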
Video capsule endoscopy is a hot topic in computer vision and medicine. Deep learning can have a positive impact on the future of video capsule endoscopy technology: it can improve the anomaly detection rate, reduce physicians' screening time, and aid in real-world clinical analysis. CADx classification systems for video capsule endoscopy have shown great promise for further improvement. For example, detecting cancerous polyps and bleeding can lead to a rapid medical response and improve patient survival rates. To this end, an automated CADx system must have high throughput and decent accuracy. In this paper, we propose FocalConvNet, a focal modulation network integrated with lightweight convolutional layers for the classification of small-bowel anatomical landmarks and luminal findings. FocalConvNet leverages focal modulation to attain global context and allows global-local spatial interactions throughout the forward pass. Moreover, convolutional blocks, with their inherent inductive/learning bias and ability to extract hierarchical features, enable our FocalConvNet to achieve favourable results with high throughput. We compare FocalConvNet with other state-of-the-art (SOTA) methods on Kvasir-Capsule, a large-scale VCE dataset with 44,228 frames across 13 classes of different anomalies. Our proposed method outperforms the other SOTA methodologies in terms of weighted F1-score, recall, and MCC. Furthermore, we report the highest throughput of 148.02 images/second, establishing the suitability of FocalConvNet for real-time clinical settings. The code of the proposed FocalConvNet is available at https://github.com/noviceman-prog/focalconvnet.
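The weighted F1-score reported above averages per-class F1 values weighted by class support, which matters on an imbalanced dataset like Kvasir-Capsule. A minimal sketch (function name illustrative, not from the FocalConvNet codebase):

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted F1 over integer class labels."""
    classes = sorted(set(y_true))
    total = len(y_true)
    score = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        # Weight each class's F1 by its share of the ground-truth labels.
        score += f1 * sum(1 for t in y_true if t == c) / total
    return score
```

In practice this matches `sklearn.metrics.f1_score(..., average="weighted")`.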
Detection and removal of precancerous polyps through colonoscopy is the primary technique for preventing colorectal cancer worldwide. However, the colorectal polyp miss rate varies widely among endoscopists. It is known that computer-aided diagnosis (CAD) systems can help endoscopists detect colon polyps and minimize the variation among endoscopists. In this study, we introduce a novel deep learning architecture, called MKDCNet, for automatic polyp segmentation that is robust to significant changes in the polyp data distribution. MKDCNet is simply an encoder-decoder neural network that uses a pre-trained ResNet50 as the encoder and a novel multiple kernel dilated convolution (MKDC) block that expands the field of view to learn more robust and heterogeneous representations. Extensive experiments on four publicly available polyp datasets and a cell nuclei dataset show that the proposed MKDCNet outperforms state-of-the-art methods both when trained and tested on the same dataset and when tested on unseen polyp data from a different distribution. With rich results, we demonstrate the robustness of the proposed architecture. From an efficiency perspective, our algorithm can process approximately 45 frames per second on an RTX 3090 GPU. MKDCNet can be a strong benchmark for building real-time systems for clinical colonoscopy. The code of the proposed MKDCNet is available at https://github.com/nikhilroxtomar/mkdcnet.
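The dilation in an MKDC-style block widens the receptive field without adding parameters: a kernel tap skips `dilation - 1` positions between samples. A minimal 1-D sketch of the idea (this is an illustration of dilated convolution in general, not the paper's block):

```python
def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D convolution with dilated (atrous) kernel taps.

    With dilation d and kernel size k, each output covers a span of
    (k - 1) * d + 1 input positions, so larger d sees wider context.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(x) - span)
    ]
```

A multi-kernel block would run several such convolutions with different dilation rates over the same input and fuse the outputs.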
Deep Reinforcement Learning (DRL) has the potential to be used for synthesizing feedback controllers (agents) for various complex systems with unknown dynamics. These systems are expected to satisfy diverse safety and liveness properties best captured using temporal logic. In RL, the reward function plays a crucial role in specifying the desired behaviour of these agents. However, the problem of designing the reward function for an RL agent to satisfy complex temporal logic specifications has received limited attention in the literature. To address this, we provide a systematic way of generating rewards in real-time by using the quantitative semantics of Signal Temporal Logic (STL), a widely used temporal logic to specify the behaviour of cyber-physical systems. We propose a new quantitative semantics for STL having several desirable properties, making it suitable for reward generation. We evaluate our STL-based reinforcement learning mechanism on several complex continuous control benchmarks and compare our STL semantics with those available in the literature in terms of their efficacy in synthesizing the controller agent. Experimental results establish our new semantics to be the most suitable for synthesizing feedback controllers for complex continuous dynamical systems through reinforcement learning.
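For context, the classical quantitative semantics of STL (which the paper's new semantics improves upon) assigns each formula a real-valued robustness: positive when satisfied, with magnitude measuring the margin. A minimal sketch for predicates of the form x < c under "always" and "eventually" (function names are illustrative):

```python
def rob_always(signal, c, window):
    """Robustness of G_[a,b] (x < c): the worst-case margin c - x_t
    over the window. Positive iff the bound holds at every step."""
    a, b = window
    return min(c - signal[t] for t in range(a, b + 1))

def rob_eventually(signal, c, window):
    """Robustness of F_[a,b] (x < c): the best-case margin c - x_t
    over the window. Positive iff the bound holds at some step."""
    a, b = window
    return max(c - signal[t] for t in range(a, b + 1))
```

Feeding such a robustness value to the agent as a per-step reward is the basic recipe; the paper's contribution is a semantics with better properties for that use.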
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the usage of such a robot in a domestic setting is still very much a research topic. This paper discusses the design and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Speech emotion detection was not as performant as ERANNs or Zeta Policy learning, but still managed an accuracy of 63.5%. The video emotion detection system produced results that are almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
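The PPO objective credited above with fast, stable learning is defined per sample by clipping the policy probability ratio. A minimal sketch of that clipped surrogate (a generic PPO illustration, not this paper's training code):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s); the clip to [1-eps, 1+eps]
    removes the incentive to move the policy too far in one update.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

The `min` makes the bound pessimistic: a large ratio cannot inflate the objective when the advantage is positive, and cannot hide a loss when it is negative.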
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
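The core of NaQ is a simple data transformation: a timestamped narration becomes a (query, temporal window) training pair for the localization model. A hedged sketch of that idea (the window heuristic and function name are assumptions for illustration, not the paper's exact recipe):

```python
def narrations_as_queries(narrations, window=2.0):
    """Convert (timestamp, narration) pairs into (query, start, end)
    training tuples by centring a fixed temporal window on each
    narration timestamp, clamped at the start of the video."""
    return [
        (text, max(0.0, ts - window / 2), ts + window / 2)
        for ts, text in narrations
    ]
```

Each narration thus supervises the model exactly like a natural-language query with a localized answer window, which is what lets abundant narrations augment scarce NLQ annotations.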
Machine Translation (MT) systems generally aim at the automatic rendering of a source language into a target language, retaining the original context, using various Natural Language Processing (NLP) techniques. Among these NLP methods, Statistical Machine Translation (SMT) uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper canvasses the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise in the dataset. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of understanding the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
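The distance reordering used above penalizes how far the decoder jumps around the source sentence between consecutively translated phrases. A minimal sketch of the Moses-style distance-based distortion cost (a generic illustration, not this paper's configuration):

```python
def distortion_cost(phrase_spans):
    """Moses-style distance-based reordering cost.

    phrase_spans: source-side (start, end) word index spans, inclusive,
    in the order the phrases are translated. Monotone translation
    (each phrase starting right after the previous one) costs 0;
    every skipped or revisited source word adds to the cost.
    """
    cost, prev_end = 0, -1
    for start, end in phrase_spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost
```

The decoder weighs this cost against the translation and language model scores, which is how long-distance reorderings between English and the ILs get discouraged unless they pay off.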
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
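A standard baseline for the motion forecasting task described above is constant-velocity extrapolation from the provided track history. A hedged sketch (not part of the AV2 toolkit; the function and sampling assumptions are illustrative):

```python
def constant_velocity_forecast(track, horizon, dt=0.1):
    """Forecast future (x, y) positions by extrapolating the velocity
    estimated from the last two observed track points.

    track: list of (x, y) positions sampled every dt seconds.
    horizon: number of future steps to predict.
    """
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon + 1)]
```

Learned models are expected to beat this baseline precisely in the "interesting and challenging interactions" the dataset was mined for, where actors accelerate, turn, or yield.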
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
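The attention in an STCA-style model lets the classifier weight the discriminative time steps of a growing season more heavily than the uninformative ones. A minimal sketch of softmax-weighted temporal pooling (a generic attention-pooling illustration, not the paper's architecture):

```python
import math

def temporal_attention_pool(features, scores):
    """Pool per-time-step feature vectors with softmax attention weights.

    features: list of equal-length feature vectors, one per time step.
    scores: one scalar relevance score per time step; higher-scoring
    (more discriminative) steps dominate the pooled feature.
    """
    m = max(scores)  # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [wi / z for wi in w]
    dim = len(features[0])
    return [sum(w[t] * features[t][d] for t in range(len(features)))
            for d in range(dim)]
```

With uniform scores this reduces to mean pooling; with one dominant score it approaches selecting that single time step.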
We propose an ensemble approach to predict the labels in linear programming word problems. Entity identification and meaning representation are the two tasks to be solved in the NL4Opt competition. For the first task, we propose the ensembleCRF method to identify the named entities; in our analysis, we found that single models did not improve performance on this task. A set of prediction models predicts the entities, and the generated results are combined to form a consensus result in the ensembleCRF method. For the second task, we present an ensemble text generator to produce the representation sentences. Because the output overflows a single model's capacity, we divide the problem into multiple smaller tasks: a single model generates different representations based on the prompt, and all the generated text is combined to form an ensemble and produce the mathematical meaning of a linear programming problem.
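The consensus step of an ensemble tagger can be as simple as a token-level majority vote over the per-model tag sequences. A hedged sketch (the tie-breaking rule and the tag names are assumptions for illustration, not the paper's exact method):

```python
from collections import Counter

def ensemble_vote(tag_sequences):
    """Token-level majority vote over per-model tag sequences.

    tag_sequences: one tag list per model, all the same length.
    Ties are broken in favour of the tag predicted by the
    earliest model in the list.
    """
    consensus = []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        best = max(counts, key=lambda t: (counts[t], -token_tags.index(t)))
        consensus.append(best)
    return consensus
```

A full ensembleCRF would additionally re-score candidate sequences for label consistency (e.g. valid B-/I- transitions), which a per-token vote alone does not guarantee.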